Authorship attribution using co-occurrence networks
(Original Portuguese title: Atribuição de Autoria utilizando Redes de Co-Ocorrência)
This thesis approaches Authorship Attribution as a classification task. The methodologies used
represent text documents as graphs, from which several measures are extracted and used as
samples for the classifier. Some earlier works also follow this approach. This thesis focuses
on a methodology that splits each text into multiple parts and treats each part as a separate graph,
from which measures are extracted. The measures from the successive graphs are treated as a time
series, from which moments are extracted. These moments make up the final vector, which represents
the entire text. The methodology is explored and extended with two variations. The first variation
skips the time-series step, so the measures from each graph are used directly as samples. The
second variation models the entire text as one graph. The methodologies are tested on corpora in
both English and Portuguese, with varying numbers of texts.
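The pipeline described in this abstract (split the text, build one graph per part, extract measures, summarize each measure's series by its moments) can be sketched as follows. This is a minimal illustration only, not the thesis's implementation: the adjacent-word linking rule, the two measures (mean degree and density), the three moments, and all function names are assumptions chosen for brevity.

```python
from statistics import fmean


def cooccurrence_graph(tokens):
    """Undirected graph linking each word to the word that follows it (assumed rule)."""
    graph = {}
    for a, b in zip(tokens, tokens[1:]):
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def graph_measures(graph):
    """Two illustrative measures: mean degree and edge density."""
    n = len(graph)
    if n < 2:
        return [0.0, 0.0]
    degrees = [len(neighbours) for neighbours in graph.values()]
    edges = sum(degrees) / 2
    return [fmean(degrees), edges / (n * (n - 1) / 2)]


def moments(series):
    """Mean, variance and third central moment of one measure's series."""
    mu = fmean(series)
    return [
        mu,
        fmean([(x - mu) ** 2 for x in series]),
        fmean([(x - mu) ** 3 for x in series]),
    ]


def text_to_vector(text, parts=4):
    """Split the text, build one graph per part, and summarize each measure by its moments."""
    tokens = text.lower().split()
    size = max(1, len(tokens) // parts)
    # Leftover tokens beyond the last full part are dropped in this sketch.
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)][:parts]
    per_part = [graph_measures(cooccurrence_graph(chunk)) for chunk in chunks]
    vector = []
    for series in zip(*per_part):  # one series per measure, ordered by part
        vector.extend(moments(list(series)))
    return vector
```

Skipping the `moments` step and using each row of `per_part` directly as a sample corresponds to the first variation; calling `graph_measures` on a single graph built from all tokens corresponds to the second.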
mCSM-AB: a web server for predicting antibody-antigen affinity changes upon mutation with graph-based signatures.
Computational methods have traditionally struggled to predict the effect of mutations in antibody-antigen complexes on binding affinity. This has limited their usefulness during antibody engineering and development, and their ability to predict biologically relevant escape mutations. Here we present mCSM-AB, a user-friendly web server for accurately predicting antibody-antigen affinity changes upon mutation which relies on graph-based signatures. We show that mCSM-AB performs better than comparable methods that have been previously used for antibody engineering. mCSM-AB web server is available at http://structure.bioc.cam.ac.uk/mcsm_ab. This is the final published version. It first appeared at http://nar.oxfordjournals.org/content/early/2016/05/23/nar.gkw458.full
DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach.
Cancer genome and other sequencing initiatives are generating extensive data on non-synonymous single nucleotide polymorphisms (nsSNPs) in human and other genomes. To understand the impacts of nsSNPs on the structure and function of the proteome, as well as to guide protein engineering, accurate in silico methodologies are required to study and predict their effects on protein stability. Despite the diversity of available computational methods in the literature, none has proven accurate and dependable on its own under all scenarios where mutation analysis is required. Here we present DUET, a web server for an integrated computational approach to study missense mutations in proteins. DUET consolidates two complementary approaches (mCSM and SDM) in a consensus prediction, obtained by combining the results of the separate methods in an optimized predictor using Support Vector Machines (SVM). We demonstrate that the proposed method improves the overall accuracy of the predictions in comparison with either method individually and performs as well as or better than similar methods. The DUET web server is freely and openly available at http://structure.bioc.cam.ac.uk/duet
A general theory to estimate Information transfer in nonlinear systems
A general theory for computing information transfers in nonlinear systems
driven by deterministic forcings and additive and/or multiplicative noises, is
presented. It extends the Liang-Kleeman framework of causality inference based
on information transfer across system variables (Liang, 2016). An effective
method of computing formulas of the rates of entropy transfers (RETs) is
presented, the Causal Sensitivity Method (CSM), relying on the estimation from
data of conditional expectations. Those expectations are approximated by
nonlinear regressions, leading to a much easier and more robust way of
computing RETs than the brute-force approach calling for numerical integrals
over the phase space and the knowledge of the multivariate probability density
function of the system. The CSM is furthermore fully adapted to the case where
no model equations are available, starting with a nonlinear model fitting from
data with the subsequent application of CSM to the fitted model. Moreover, the
RETs are decomposed into sums of single one-to-one RETs plus synergetic terms,
accounting for the joint causal effect of groups of variables. State-dependent
RET formulas are also proposed, allowing for determining the dependencies of
variables and synergies locally in phase space.
A comparison of the RETs estimations is performed between the brute-force
probability-density-based approach (AN), the CSM-based approach with and/or
without model fitting, and the multivariate linear approach, in the context of
two models: (i) a model derived from a potential and (ii) the classical chaotic
Lorenz system, both forced by additive and/or multiplicative noises. The
analysis demonstrates that the CSM estimations are robust and close to the
AN-reference values in the different experiments, providing evidence of the
possibilities offered by the method and opening new perspectives on real-world
applications.
Comment: 41 pages, 6 figures. Submitted to Physica
CYCLOSA: Decentralizing Private Web Search Through SGX-Based Browser Extensions
By regularly querying Web search engines, users (unconsciously) disclose
large amounts of personal data as part of their search queries, some of
which may reveal sensitive information (e.g. health issues or sexual,
political or religious preferences). Several solutions exist that allow users
to query search engines with improved privacy protection. However, these
solutions suffer from a number of limitations: some are subject to user
re-identification attacks, while others lack scalability or are unable to
provide accurate results. This paper presents CYCLOSA, a secure, scalable and
accurate private Web search solution. CYCLOSA improves security by relying on
trusted execution environments (TEEs) as provided by Intel SGX. Further,
CYCLOSA proposes a novel adaptive privacy protection solution that reduces the
risk of user re-identification: it sends fake queries to the search
engine and dynamically adapts their number to the sensitivity of the
user query. In addition, CYCLOSA achieves scalability because it is fully
decentralized, spreading the load of distributing fake queries among other
nodes. Finally, CYCLOSA preserves the accuracy of Web search because it handles
the real query and the fake queries separately, in contrast to existing solutions
that mix fake and real query results.
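The adaptive idea in this abstract, more fake queries for more sensitive real queries with the real query tracked separately, can be illustrated with a small sketch. Everything here is hypothetical: the abstract does not say how sensitivity is scored or mapped to a count, so the score in [0, 1], the linear rule, the cap `k_max`, and the function names are assumptions made only for illustration.

```python
import math
import random


def fake_query_count(sensitivity, k_max=7):
    """Map a sensitivity score in [0, 1] to a fake-query count (assumed linear rule)."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be in [0, 1]")
    return math.ceil(sensitivity * k_max)


def build_batch(real_query, sensitivity, decoy_pool, rng=None):
    """Return shuffled (query, is_real) pairs; the flag keeps real results separable."""
    rng = rng or random.Random()
    k = min(fake_query_count(sensitivity), len(decoy_pool))
    fakes = rng.sample(decoy_pool, k)
    batch = [(query, False) for query in fakes] + [(real_query, True)]
    rng.shuffle(batch)  # hide the position of the real query
    return batch
```

Keeping the `is_real` flag on the client side is one simple way to return only the real query's results to the user instead of a mix of real and fake results.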
Adapting Pretrained Language Models for Solving Tabular Prediction Problems in the Electronic Health Record
We propose an approach for adapting the DeBERTa model for electronic health
record (EHR) tasks using domain adaptation. We pretrain a small DeBERTa model
on a dataset consisting of MIMIC-III discharge summaries, clinical notes,
radiology reports, and PubMed abstracts. We compare this model's performance
with a DeBERTa model pre-trained on clinical texts from our institutional EHR
(MeDeBERTa) and an XGBoost model. We evaluate performance on three benchmark
tasks for emergency department outcomes using the MIMIC-IV-ED dataset. We
preprocess the data to convert it into text format and generate four versions
of the original datasets to compare data processing and data inclusion. The
results show that our proposed approach outperforms the alternative models on
two of three tasks (p<0.001) and matches performance on the third task, with
the use of descriptive columns improving performance over the original column
names.
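The preprocessing step mentioned here, converting a tabular record into text with either the original or descriptive column names, might look like the sketch below. The serialization format, the column names, and the `descriptions` mapping are assumptions for illustration; the abstract does not specify how the four dataset versions were built.

```python
def row_to_text(row, descriptions=None):
    """Serialize one tabular record as 'label: value' text.

    `descriptions` optionally maps raw column names to descriptive labels
    (a hypothetical mapping; columns without an entry keep their raw name).
    """
    descriptions = descriptions or {}
    parts = []
    for column, value in row.items():
        if value is None:
            continue  # skip missing values rather than writing "None"
        label = descriptions.get(column, column)
        parts.append(f"{label}: {value}")
    return "; ".join(parts)


# Hypothetical record with terse column names.
record = {"hr": 88, "sbp": None, "cc": "chest pain"}
labels = {"hr": "heart rate", "cc": "chief complaint"}
raw_text = row_to_text(record)
descriptive_text = row_to_text(record, labels)
```

The resulting strings can then be tokenized like any other clinical text before being passed to a language model.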
Biological interactions between nematophagous fungi, Esteya spp., and the pinewood nematode, Bursaphelenchus xylophilus
The pinewood nematode (PWN), Bursaphelenchus xylophilus, is a quarantine organism in several countries and the causal agent of pine wilt disease (PWD), a serious threat to pine forests worldwide. PWD results from complex interactions between the nematode, its insect vector, Monochamus spp., and host plants (conifers), with the nematode as the common element in this interaction. The PWN is considered the sixth most economically important plant-parasitic nematode. In Europe, this pest was first reported in Portugal in 1999, in maritime pine, Pinus pinaster. Due to its economic importance and worldwide distribution, an enormous amount of effort is devoted to research on B. xylophilus and PWD. Scenarios strongly suggest that climate change is likely to spread PWD and cause outbreaks in areas currently free of the disease. The urgent need for sustainable management strategies has led to an increasing interest in antagonists capable of suppressing the PWN. Nematophagous fungi belonging to the Esteya genus are reported as natural enemies of the PWN and promising biocontrol agents. There are currently two described species: E. vermicola and E. floridanum, the first of which is capable of mimicking volatile organic compounds produced naturally by Pinus spp. in order to attract PWN. However, few studies have been carried out on the development of Esteya spp. inside pine trees, and none using maritime pine, the main and most affected species in Portuguese forests and their largest carbon reservoir. It is therefore crucial to understand the plant-nematode-fungus interactions between P. pinaster, B. xylophilus and Esteya spp. To this end, biological interactions between these two antagonists, the PWN and P. pinaster were investigated, namely fungus-fungus, fungus-nematode and fungus-tree interactions, together with feeding trials and chemotaxis assays to determine the attractive power of both fungal species.
These results will indicate the most promising species for biocontrol and help devise new ways to manage PWD.